DATA PREPARATION FOR INTERACTIVE ELECTRONIC PROGRAM GUIDES



Janet Greco 

Infomedia S.A., Luxembourg



ABSTRACT 

This paper defines the role of a clearing house providing TV listings to Electronic Program Guide service providers.
It provides an overview of the work that Infomedia S.A. does in terms of collection, processing, formatting and
consistently applying program genre classifications to listings across a group of channels. The clearing house acts as 
the filter through which the data can be delivered in many different user-specific formats, ranging from print 
publishers' needs to digital Electronic Program Guide providers, including DVB Service Information. What services 
do such companies provide and what standards exist or still need to be defined? How do they assist service providers 
with the DVB-defined "Service Information" data stream for EPGs? And how does the concept also translate to other
types of information-based interactive applications? 



INTRODUCTION 

Databases are obviously the foundation for all 
interactive applications. Yet many people involved in 
the creation of Electronic Program Guides (EPGs) for 
digital and other transmission methods assume that 
the data for EPGs is readily available in a consistent 
database format for easy inclusion into the DVB 
"Service Information" data stream. 

The work involved in getting the broadcasters'
original schedule listings into a consistent database
format is actually considerable. And since EPGs
impose a new requirement to provide late program
changes to the data stream in "real-time", there is
still much work to be done.

Infomedia has been in the market of preparing and 
distributing TV program listings for the past five 
years. The idea behind Infomedia was simple from the 
start. We aimed to provide a centralised electronic 
database for publishers and others who needed to 
access updated TV program listings.

The service we provide is and always has been
available online, with data formats tailored to our
clients' needs.

♦ Electronic and centralised database 

♦ Focus - European TV listings and media news 

♦ Online - Accessed electronically by publishers 

♦ Formats - customised to clients' needs 



The purpose of this paper is to acquaint you with the
work that we do, and to highlight the realities of how
the collection and data processing work is done. We
hope to give you a better sense of the detail involved
in the creation of EPGs.

BACKGROUND 

When I founded Infomedia five years ago, it was
obvious to me that the way to work was not to hire
an army of typists to collect and re-key the data, but
to access, somehow, an "electronic" file which had
already been prepared in the marketing department of
a TV channel.

When I was working in TV, most channels sent out 
printed guides by mail, and program changes by fax or 
telex. It was a costly and tedious operation.

Over the course of the last five years, things have 
obviously changed quite a lot. For one thing, we 
increased the number of channels we processed 
sixfold to now around 200 European channels. 

But the most notable trend we witnessed during the
early nineties was the implicit acceptance by the
channels that electronic distribution was the natural
way to distribute listings. This became clear as the
number of electronic bulletin boards set up by
individual channels increased.

Bulletin boards eased the workload of the channels,
but didn't necessarily help the end user, though it was
a start. It had previously been difficult enough to
empty all the bulky envelopes arriving with the
schedules, retype their contents, and manage the late
changes arriving by fax. Now, professional users were
expected to separately access individual e-mail or
electronic bulletin board systems, with separate
passwords and entry routines, for every channel
which they needed to publish. Moreover, different
electronic file formats were provided, and the content
quality differed from channel to channel.

Professional end-users include: 

♦ newspapers and magazines 

♦ press agencies 

♦ consumer online and internet guides 

♦ cable operators' teletext services 

♦ digital electronic program guides 

Infomedia's work is focused on the preparation of
standardised data. The work we do enables the output 
of customised formats for a collective group of 
channels - a process which we have learned to 
automate quite expertly. We provide the raw material 
for the creation of all types of TV guides. 

DATA COLLECTION 

This process involves collecting, processing and 
formatting TV program data which is provided to us 
by the channels in many different ways. 

♦ floppy disks 

♦ modem 

♦ e-mail 

♦ online 

As a listings company, we are unique in that more 
than 80% of the data which we collect arrives, in the 
first place, in electronic form. But every electronic 
file received, from all the various sources, arrives in a 
different format. 

♦ 80% arrives electronically 

♦ at least 25 different incoming software formats 

♦ different data content for each channel 

♦ different presentation formats for each channel 

♦ no database files - only flat text files 

Even though Infomedia receives an enormous number
of electronic files, our work isn't magically simplified
as a result. We soon realised that we would get a
multitude of different file formats, depending on the
different types of software in use at each of the
channels.



In addition, almost every channel provides us with
flat text files, or files in strange formats (like
spreadsheets), rather than database format files. The
content and data presentation formats also vary
widely. But our function, as a clearing house,
would still be required even if every channel were able
to send us a more structured database file, with all the
fields completely and accurately filled in.
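
As a rough illustration of the normalisation step described above, the sketch below converts two incoming flat text layouts into one common record. Both layouts, the channel names and the function names are hypothetical, invented for this example; they are not Infomedia's actual formats.

```python
from dataclasses import dataclass
from datetime import date, time


@dataclass(frozen=True)
class Listing:
    """Common record that every incoming file is converted into."""
    channel: str
    day: date
    start: time
    title: str


def parse_tab_line(channel: str, line: str) -> Listing:
    """Hypothetical layout A: 'YYYY-MM-DD<TAB>HH:MM<TAB>Title'."""
    day_text, time_text, title = line.rstrip("\n").split("\t", 2)
    hours, minutes = time_text.split(":")
    return Listing(channel, date.fromisoformat(day_text),
                   time(int(hours), int(minutes)), title.strip())


def parse_dotted_line(channel: str, line: str) -> Listing:
    """Hypothetical layout B: 'DD.MM.YYYY HH.MM Title'."""
    day_text, time_text, title = line.strip().split(" ", 2)
    dd, mm, yyyy = (int(part) for part in day_text.split("."))
    hours, minutes = (int(part) for part in time_text.split("."))
    return Listing(channel, date(yyyy, mm, dd), time(hours, minutes), title.strip())


# One parser per incoming format; each channel is mapped to the parser it needs.
PARSERS = {
    "Channel A": parse_tab_line,
    "Channel B": parse_dotted_line,
}


def normalise(channel: str, line: str) -> Listing:
    """Dispatch a raw line to the parser registered for its channel."""
    return PARSERS[channel](channel, line)
```

Once every source has passed through its own parser, the rest of the processing only ever sees the common record, whatever the original file looked like.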

Some Database Fields 

♦ Date: scheduled date of the program 

♦ Time: expressed as 24 hour clock 

♦ Title: scheduled title 

♦ Synopsis: description of the contents of the 
program, excluding: 

♦ Country: country of production

♦ Year of production: year of production

♦ Original Language Title: actual original language
title

♦ Episode Title: scheduled episode title

♦ Original Episode Title: actual original language title

♦ VPS: VPS programming time and date

♦ Live: Yes/No

♦ Stereo: Yes/No (excludes two-tone)

♦ Two-tone (Zweikanalton): Yes/No

♦ Encryption: Yes/No - program is encrypted

♦ Language: language of program

♦ First showing: Yes/No - is first appearance on
channel

♦ Last showing: Yes/No - last appearance on channel

♦ Pay per view: Yes/No - scheduled event is p-p-v

♦ Is umbrella: Yes/No - is "umbrella" title for a group
of programs

♦ Channel rating: quality rating assigned by the
channel

♦ Channel category: genre of program, according to
the channel

♦ Actor List: list of actors (without roles)

When we got to know all the different ways of 
producing the data in-house at the channels, we knew 
that the format, field lengths, content, and types of 
databases used would always be different from one 
company to the next, and that rigid quality control 
would always be in order. 
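
A subset of the fields listed above could be held in a record like the one sketched below, with a basic quality check run on every incoming listing. The field names, the chosen subset and the specific checks are assumptions made for illustration, not a description of Infomedia's database.

```python
from dataclasses import dataclass, field
from datetime import date, time
from typing import List, Optional


@dataclass
class ProgramRecord:
    day: date                       # Date: scheduled date of the program
    start: time                     # Time: 24 hour clock
    title: str                      # Title: scheduled title
    synopsis: str = ""
    country: Optional[str] = None   # country of production
    year: Optional[int] = None      # year of production
    live: bool = False
    stereo: bool = False
    two_tone: bool = False          # Zweikanalton
    actors: List[str] = field(default_factory=list)


def quality_problems(record: ProgramRecord) -> List[str]:
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    if not record.title.strip():
        problems.append("missing title")
    if record.year is not None and record.year > record.day.year:
        problems.append("year of production is after the scheduled date")
    if record.stereo and record.two_tone:
        # The field list defines Stereo as excluding two-tone.
        problems.append("stereo and two-tone both set")
    return problems
```

Checks of this kind are cheap to run on every record, which matters when the same fields arrive filled in differently by each channel.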

UNIVERSAL GENRE CLASSIFICATION 
SYSTEM (UGCS) 

Another reason why a "clearing house" is needed, in
its gatekeeper role, is the consistent application of
genres across all channels.

♦ consistently applied genres 

♦ conversion tables 

♦ maximum flexibility 

♦ language and country specific

In 1993 Infomedia began its first detailed research
into the area of program genre classifications,
knowing all too well that a universal standard did not
yet exist. So we created one ourselves.

Our Universal Genre Classification System was the
result of drawing together different industry attempts,
by a wide range of European media institutions, to
create a system that would work on a European level.

Given that each national culture has its own
peculiarities for describing its programs, we created
instead an internal standard which we apply to all the
programs in our database. So in the same way that we 
standardise the data first, before offering customised 
outputs, we apply our own internal classification 
standard to every program. 

UGCS - Three levels of classification 

♦ "Category" (e.g. factual, advertising, fiction)

♦ "Type" (e.g. news, magazine, film, series)

♦ "Content" (e.g. adult, satire, jazz, politics, bowling)

UGCS breaks down every program into three levels of
description. The first, "Category", is divided into 12
headings. The second, "Type", is directly tied to
the first-level category, while the third, "Content", is
divided into hundreds of content descriptors.
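
The three-level structure can be pictured as a small data type. This is a minimal sketch: only the level names and the example values come from the text above. The pairing of types with categories in the table, and all identifiers, are assumed for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Assumed pairing of "Type" values with their parent "Category";
# the real UGCS has 12 categories and is not reproduced here.
TYPES_BY_CATEGORY = {
    "Factual": {"News", "Magazine"},
    "Fiction": {"Film", "Series"},
}


@dataclass(frozen=True)
class Genre:
    category: str               # level 1
    program_type: str           # level 2, must belong to the category
    content: FrozenSet[str]     # level 3, any number of descriptors

    def __post_init__(self) -> None:
        allowed = TYPES_BY_CATEGORY.get(self.category)
        if allowed is None:
            raise ValueError(f"unknown category: {self.category}")
        if self.program_type not in allowed:
            raise ValueError(
                f"{self.program_type} is not a type of {self.category}")
```

Rejecting a type that does not belong to its category at construction time keeps inconsistent classifications out of the database in the first place.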

What is unique about this structure is that we can 
output the genre in any language, or group any 
combination of content, to get the results our clients 
want. 

Conversion tables for language-specific output 

We do this by programming special output formats,
using conversion tables.

Example 

♦ Category called "Action Film" is required by client

♦ Language of genre is Dutch 

Conversion table: If "Category"="Fiction",
"Type"="Film", and "Content"=Action+Adventure+
Disaster+Karate+War, Then Output="actiefilm".

Once set up, the recipient can automatically retrieve 
the data in the output format of their choice. 
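
The conversion rule in the example might be implemented along these lines. This sketch assumes that the "+" in the rule means "any one of these content descriptors"; the text does not say so explicitly. The rule structure, the function name and the fallback value are likewise hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Rule:
    category: str
    program_type: str
    content_any: FrozenSet[str]   # assumed: a match on any one descriptor suffices
    output: str


# Client-specific table for Dutch output, holding the single rule from the example.
DUTCH_TABLE: List[Rule] = [
    Rule("Fiction", "Film",
         frozenset({"Action", "Adventure", "Disaster", "Karate", "War"}),
         "actiefilm"),
]


def convert(category: str, program_type: str, content: FrozenSet[str],
            table: List[Rule]) -> Optional[str]:
    """Return the client's genre label, or None when no rule matches."""
    for rule in table:
        if (rule.category == category
                and rule.program_type == program_type
                and rule.content_any & content):
            return rule.output
    return None
```

Because the table is data rather than code, a second client wanting a different grouping or language would only need a different table, not a different program.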

With UGCS, we ensure that the data can be 
interactively searched, with consistent results. 

UPDATES IN REAL TIME 

EPGs impose a new requirement to provide late 
program changes to the data stream in "real-time". 



As explained earlier, the main source of our data from 
within the channels is still the press office. This is the 
department which has always traditionally been 
responsible for the dissemination of the detailed 
program schedules. 

Often, the basic information - time and titles only -
is transferred to the press departments from
scheduling so that they may create the detailed
printed guide, with actors' and directors' names, and so
on.

Today in cases where an individual channel has an 
electronic distribution service, or even an Internet site, 
the files are periodically refreshed, but with different 
frequencies, depending on the individual channel. 

A clearing house for listings takes a centralised
approach to updating the schedules. By offering a 
constantly updated centralised source, we eliminate 
the need for individual publishers to replicate the 
updating work. 

Real time updates are a big challenge for us as well as
for the channels. The traditional press never needed
updates in 'real-time'. But for EPGs such updates are
critical. We are not aware of any general broadcaster
in Europe that is really prepared for the transmission
of 'real-time' updates, appropriately formatted for
EPGs.

That is a challenge we are exploring with the
channels, to see if there is a better way to capture a
data stream which can be processed into the
recipient's requested format, classified and converted
into the requested genre, and transmitted,
appropriately coded in DVB Service Information (SI)
format.

For digital broadcasters, the transmission of the basic 
SI now/next data stream is a mandatory requirement. 
The technical people involved in the creation of the SI 
standard naturally assumed that databases would feed 
the SI and extended SI data streams. But
consistent data streams are not readily available. It is
the work of Infomedia to prepare standardised 
database format files for Electronic Program Guides. 
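
To make the now/next idea concrete, the sketch below picks the current and following events from a schedule, and shows how a late change alters the result. It is a simplification under stated assumptions: the event structure and function names are invented here, and nothing in it encodes actual DVB SI tables.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Event:
    start: datetime
    title: str


def now_next(schedule: List[Event],
             at: datetime) -> Tuple[Optional[Event], Optional[Event]]:
    """Return (current event, following event) for the given moment."""
    ordered = sorted(schedule, key=lambda event: event.start)
    current = None
    following = None
    for event in ordered:
        if event.start <= at:
            current = event      # latest event that has already started
        else:
            following = event    # first event that has not started yet
            break
    return current, following


def apply_late_change(schedule: List[Event], start: datetime,
                      new_title: str) -> List[Event]:
    """Replace the title of the event starting at `start`; keep the others."""
    return [Event(e.start, new_title) if e.start == start else e
            for e in schedule]
```

If a late change reaches the database only after the affected slot has begun, the now/next pair sent to viewers is already wrong, which is why the timing of updates matters more for EPGs than it did for print.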

The need for consistent data is obvious, especially to 
anyone well acquainted with databases. But the 
diversity of sources raises a profound question for all 
types of interactive, and data broadcasting, 
applications. Where is the data coming from? 

Despite our progress, there is still much work to be 
done. We have a lot of channels on the way, and 
technologies such as the V-chip are going to further 
require standardisation and codification of data. 



What kinds of companies will collate and standardise 
the content and presentation formats? Electronic data 
clearing houses, such as Infomedia, will. But the 
channels have to play their part too. 

Effective program promotion is critical to the 
commercial success of all the new channels, 
especially in this digital age. And it is up to the 
channels as well as the digital bouquet providers to 
promote their programs. But, ultimately, it is the
individual channel that is at risk if accurate, complete
information is not available in the collective data
stream.

The channels must make it easier for everyone who
needs this information to get it. There are many other
kinds of TV guides such as printed magazines and 
newspapers, which are, and have always been, 
thriving here in Europe. 

But there are a variety of electronic TV guides too:
on-screen cable TV guides, teletext guides, internet
TV guides, and "EPGs", most talked about as part of
digital bouquets, but also part of digital terrestrial ones.

All of these different TV Guides require a 
standardised, consistent data stream of listings, and 
EACH digital EPG requires a differently formatted 
data stream based on DVB-SI.

So the clearing house role has an important future, not 
just for program listings, and their attendant photos 
and videos, but for all kinds of information-based 
interactive, and data broadcasting, applications. 

But right now, it's complicated enough just to collect 
and process the text, even with all our automation 
routines. 

Infomedia has a lot more work ahead, but we are in a 
unique position to clarify some of the more detailed 
aspects of the data distribution, genre classification, 
and updating process. 